Iris Data Principal Components Analysis

This is a built-in R data set from Fisher and Anderson. It contains four measurements (columns 1-4) for 50 samples from each of three iris species (column 5). Let's run PCA on the data and see whether we can visualize it in a 2D plot:

data('iris')
head(iris) # see what it looks like
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
iris.x = iris[,1:4] # measurements
iris.y = iris[,5] # our response labels
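A quick sanity check of the split above can confirm the shape described in the introduction: 150 rows of four measurements and 50 samples per species. This is a small sketch that simply repeats the two assignments so it runs on its own.

```r
# Sanity check of the measurement/label split (repeats the assignments above)
data(iris)
iris.x = iris[, 1:4]   # the four measurement columns
iris.y = iris[, 5]     # the Species factor

dim(iris.x)            # 150 rows, 4 columns
table(iris.y)          # 50 samples for each of the three species
```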

iris.pca = prcomp(iris.x, center = TRUE, scale. = TRUE) # center and standardize each measurement before PCA
summary(iris.pca)
## Importance of components:
##                           PC1    PC2     PC3     PC4
## Standard deviation     1.7084 0.9560 0.38309 0.14393
## Proportion of Variance 0.7296 0.2285 0.03669 0.00518
## Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
plot(iris.pca, main = "Variance of PCs 1-4", xlab = "PCs 1-4") # bar heights are the variances (sdev^2) of each PC
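Where do the numbers in the summary come from? Because the data are standardized (`scale. = TRUE`), the variance of each PC should equal an eigenvalue of the correlation matrix of the four measurements, and the "Proportion of Variance" row is each variance divided by their total. The sketch below checks this; `pc.var`, `eig`, and `prop.var` are illustrative names, not part of the original analysis.

```r
# Sketch: PC variances vs. eigenvalues of the correlation matrix
data(iris)
iris.x = iris[, 1:4]
iris.pca = prcomp(iris.x, center = TRUE, scale. = TRUE)

pc.var = iris.pca$sdev^2            # variance of each PC
eig = eigen(cor(iris.x))$values     # eigenvalues of the correlation matrix
all.equal(pc.var, eig)              # should print TRUE (within floating-point tolerance)

prop.var = pc.var / sum(pc.var)     # matches the "Proportion of Variance" row
round(prop.var, 4)
```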

Together, PC1 and PC2 account for about 95.8% of the total variance (the "Cumulative Proportion" for PC2), so we'll use these two components for the plot.
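The same decision can be made programmatically by taking the smallest number of PCs whose cumulative proportion of variance reaches a threshold. This is a sketch: the 95% cutoff mirrors the text above but is a judgment call, and `cum.var` and `n.pc` are illustrative names.

```r
# Sketch: choose the number of PCs by a cumulative-variance threshold
data(iris)
iris.pca = prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

prop.var = iris.pca$sdev^2 / sum(iris.pca$sdev^2)
cum.var = cumsum(prop.var)          # same values as the "Cumulative Proportion" row
n.pc = which(cum.var >= 0.95)[1]    # first count of PCs at or above 95%
n.pc                                # 2: PC1 alone is ~73%, PC1 + PC2 is ~95.8%
```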

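plot(iris.pca$x[,1], iris.pca$x[,2], col=iris.y, main="2D PCA Plot of Iris Data", xlab="PC1 (Variance 73%)", ylab="PC2 (Variance 23%)")
legend("topright", legend=levels(iris.y), col=1:3, pch=1) # colors follow the factor levels of Species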
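The coordinates plotted above, `iris.pca$x`, are the standardized measurements projected onto the loading vectors in `iris.pca$rotation`. The sketch below recomputes them by hand to make that explicit; `z` and `scores` are illustrative names.

```r
# Sketch: PC scores = standardized data %*% rotation (loadings)
data(iris)
iris.x = iris[, 1:4]
iris.pca = prcomp(iris.x, center = TRUE, scale. = TRUE)

z = scale(iris.x, center = TRUE, scale = TRUE)   # standardize each column
scores = z %*% iris.pca$rotation                 # project onto PC1..PC4
all.equal(unname(scores), unname(iris.pca$x))    # should print TRUE
```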

Plotly 2D

The three species form clusters in the plot above, but wouldn't it be nice to hover over a point and learn more about it? We can do this with Plotly!
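A minimal sketch of an interactive version of the 2D plot, assuming the plotly package is installed. The hover text (species plus petal length) is our own illustrative choice, and `pca.df` and `p` are hypothetical names; the chunk that originally produced the Plotly figure is not shown here.

```r
# Sketch: interactive 2D PCA plot with hover text (assumes plotly is installed)
library(plotly)

data(iris)
iris.pca = prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
pca.df = data.frame(iris.pca$x[, 1:2], Species = iris$Species,
                    Petal.Length = iris$Petal.Length)

p = plot_ly(pca.df, x = ~PC1, y = ~PC2, color = ~Species,
            type = "scatter", mode = "markers",
            text = ~paste("Species:", Species, "<br>Petal length:", Petal.Length),
            hoverinfo = "text")   # show only our custom text on hover
p = layout(p, title = "2D PCA Plot of Iris Data",
           xaxis = list(title = "PC1 (Variance 73%)"),
           yaxis = list(title = "PC2 (Variance 23%)"))
p   # printing the object renders the interactive plot
```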


Challenge: Now try to make a 3D Plot!

library(plotly)

# We'll add the 3rd PC to the data frame
iris.pca.df = data.frame(iris.pca$x[, 1:3], Species = iris.y) # columns PC1, PC2, PC3, Species

# YOUR CODE HERE!
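If you get stuck, here is one possible approach: a sketch assuming plotly is installed, using a `scatter3d` trace with one color per species. `p3` is a hypothetical name, and the axis titles reuse the rounded variance percentages from the summary (PC3 is about 4%).

```r
# One possible answer to the challenge (sketch; assumes plotly is installed)
library(plotly)

data(iris)
iris.pca = prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
iris.pca.df = data.frame(iris.pca$x[, 1:3], Species = iris$Species)

p3 = plot_ly(iris.pca.df, x = ~PC1, y = ~PC2, z = ~PC3, color = ~Species,
             type = "scatter3d", mode = "markers")
p3 = layout(p3, title = "3D PCA Plot of Iris Data",
            scene = list(xaxis = list(title = "PC1 (Variance 73%)"),
                         yaxis = list(title = "PC2 (Variance 23%)"),
                         zaxis = list(title = "PC3 (Variance 4%)")))
p3   # rotate and hover over the points in the viewer
```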